A Flexible, Scalable Finite-state Transducer Architecture for Corpus-based Concatenative Speech Synthesis

Authors

Jon R. W. Yi

James R. Glass

Abstract

In this paper we describe the conversion of our phonologically-based synthesizer into a finite-state transducer (FST) representation that can be used for real-time, natural-sounding synthesis. We have designed a transducer structure that efficiently performs unit selection, a common task in concatenative speech synthesis. By encapsulating domain-independent concatenative synthesis costs in a constraint kernel, we have obtained a topology that scales linearly with the size of the synthesis corpus. The FST representation provides a flexible, unified framework in which we can leverage our previous work in speech recognition, in areas such as pronunciation modelling and search. The FST synthesizer has been incorporated into two servers that operate within our conversational system architecture to convert meaning representations into waveforms. We have had preliminary success with the new FST-based synthesis in several constrained spoken dialogue applications.
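As a rough illustration of the unit selection task mentioned in the abstract, the sketch below picks one corpus segment per target phone by dynamic programming over concatenation costs. It is not the authors' FST implementation and does not model the constraint kernel. The names `index_corpus`, `adjacency_cost`, and `select_units`, the toy corpus, and the 0/1 cost are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A unit is (phone label, position of that segment in the toy corpus).
Unit = Tuple[str, int]


def index_corpus(phones: Sequence[str]) -> Dict[str, List[Unit]]:
    """Group corpus segments by phone label so candidates can be looked up."""
    index: Dict[str, List[Unit]] = {}
    for pos, phone in enumerate(phones):
        index.setdefault(phone, []).append((phone, pos))
    return index


def adjacency_cost(prev: Unit, cur: Unit) -> float:
    """Hypothetical join cost: free if the two segments were adjacent
    in the corpus, otherwise a unit penalty."""
    return 0.0 if cur[1] == prev[1] + 1 else 1.0


def select_units(
    targets: Sequence[str],
    corpus: Dict[str, List[Unit]],
    concat_cost: Callable[[Unit, Unit], float],
) -> Tuple[float, List[Unit]]:
    """Viterbi-style search for the cheapest sequence of corpus units
    whose labels spell out the target phone sequence."""
    best: Dict[Unit, Tuple[float, List[Unit]]] = {}
    for i, phone in enumerate(targets):
        candidates = corpus.get(phone, [])
        if not candidates:
            raise KeyError(f"no corpus unit for phone {phone!r}")
        current: Dict[Unit, Tuple[float, List[Unit]]] = {}
        for unit in candidates:
            if i == 0:
                current[unit] = (0.0, [unit])
            else:
                # Cheapest way to reach this unit from any previous candidate.
                cost, path = min(
                    ((c + concat_cost(p, unit), pth) for p, (c, pth) in best.items()),
                    key=lambda item: item[0],
                )
                current[unit] = (cost, path + [unit])
        best = current
    return min(best.values(), key=lambda item: item[0])


# Toy corpus "k ae t s ae t"; synthesize "s ae t".
corpus = index_corpus(["k", "ae", "t", "s", "ae", "t"])
cost, path = select_units(["s", "ae", "t"], corpus, adjacency_cost)
print(cost, path)  # 0.0 [('s', 3), ('ae', 4), ('t', 5)]
```

In an FST formulation the same search would typically be expressed as a shortest-path computation over composed transducers; the plain dictionary-based search here is only meant to make the cost structure concrete.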


Similar resources


Joint prosody prediction and unit selection for concatenative speech synthesis

In this paper we describe how prosody prediction can be efficiently integrated with the unit selection process in a concatenative speech synthesizer under a weighted finite-state transducer (WFST) architecture. WFSTs representing prosody prediction and unit selection can be composed during synthesis, thus effectively expanding the space of possible prosodic targets. We implemented a symbolic pr...


Preliminary Evaluations of a WFST Speech Decoder

In this paper we present preliminary evaluations on the large vocabulary speech decoder we are currently developing at Tokyo Institute of Technology. Our goal is to build a scalable and flexible decoder to operate on weighted finite state transducer (WFST) search spaces. Even though the development of the decoder is still in its infancy we are already achieving good accuracy and speed on a larg...


Architectures for Speech-to-Speech Translation Using Finite-state Models

Speech-to-speech translation can be approached using finite-state models and several ideas borrowed from automatic speech recognition. The models can be Hidden Markov Models for the acoustic part, language models for the source language, and finite-state transducers for the transfer between the source and target language. A “serial architecture” would use the Hidden Markov and the language mode...


Unit selection for speech synthesis using splicing costs with weighted finite state transducers

In this paper we describe how unit selection for concatenative speech synthesis can be implemented efficiently for sub-phonetic units using weighted finite state transducers (WFST). We also introduce splicing costs as a measure to indicate which unit boundaries are particularly good or poor joint points. Splicing costs extend the flexibility offered by the unit selection paradigm. Through a per...


Publication date: 2000
